Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches

Authors

  • Yuya Unno
  • Takashi Ninomiya
  • Yusuke Miyao
  • Jun'ichi Tsujii
Abstract

Sentence compression is the task of creating a short, grammatical sentence by removing extraneous words or phrases from an original sentence while preserving its meaning. Existing methods learn statistics on trimming context-free grammar (CFG) rules. However, these methods sometimes lose the original meaning by incorrectly removing important parts of sentences, because the trimming probabilities depend only on the parent and daughter non-terminals of the applied CFG rules. We apply a maximum entropy model to the above method. Our method can easily include various features, for example, other parts of the parse tree or the words the sentences contain. We evaluated the method using manually compressed sentences and human judgments. We found that our method produced more grammatical and informative compressed sentences than other methods.
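The idea of scoring trimmed CFG rules with a maximum entropy model can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the feature templates, and the hand-set weights are all assumptions chosen for the example, and a real system would learn the weights from manually compressed sentences. Each candidate keeps an order-preserving subset of a rule's daughters, and the model normalizes exponentiated feature scores over all candidates.

```python
import math
from itertools import combinations


def candidates(n):
    """All non-empty, order-preserving subsets of daughter positions 0..n-1."""
    return [idx for k in range(1, n + 1) for idx in combinations(range(n), k)]


def features(parent, daughters, kept, head_word):
    """Hypothetical feature templates for one trimming decision."""
    feats = {}
    for i, d in enumerate(daughters):
        if i not in kept:
            # Rule-local feature: which daughter is dropped under which parent.
            feats[f"drop:{parent}>{d}"] = 1.0
            # Lexical feature that looks beyond the non-terminals of the rule.
            feats[f"drop:{d}|head={head_word}"] = 1.0
    return feats


def maxent_distribution(parent, daughters, head_word, weights):
    """P(kept | rule) = exp(w . f) / Z over all candidate trimmings."""
    cands = candidates(len(daughters))
    scores = [
        sum(weights.get(name, 0.0) * value
            for name, value in features(parent, daughters, c, head_word).items())
        for c in cands
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {c: e / z for c, e in zip(cands, exps)}


# Assumed weights: dropping an adjective is cheap, dropping the noun is costly.
weights = {"drop:NP>JJ": 1.0, "drop:NP>DT": -1.0, "drop:NP>NN": -3.0}
dist = maxent_distribution("NP", ["DT", "JJ", "NN"], "dog", weights)
best = max(dist, key=dist.get)
kept_labels = [["DT", "JJ", "NN"][i] for i in best]
```

With these assumed weights, the highest-probability trimming of NP → DT JJ NN keeps positions 0 and 2 (DT and NN). Because features are plain named indicators, adding information from elsewhere in the parse tree only requires emitting more feature names.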

Related articles

Learning to Parse Bilingual Sentences Using Bilingual Corpus and Monolingual CFG

We present a new method for learning to parse a bilingual sentence using Inversion Transduction Grammar trained on a parallel corpus and a monolingual treebank. The method produces a parse tree for a bilingual sentence, showing the shared syntactic structures of the individual sentences and the differences of word order within a syntactic structure. The method involves estimating lexical tr...

A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to...

Automatic Synthesis of Semantics for Context-free Grammars

We are investigating the mechanical transformation of an unambiguous context-free grammar (CFG) into a definite-clause grammar (DCG) using a finite set of examples, each of which is a pair ⟨s, m⟩, where s is a sentence belonging to the language defined by the CFG and m is a semantic representation (meaning) of s. The resulting DCG would be such that it can be executed (by the interpreter of a logic...

Improving Multi-documents Summarization by Sentence Compression based on Expanded Constituent Parse Trees

In this paper, we focus on the problem of using sentence compression techniques to improve multi-document summarization. We propose an innovative sentence compression method by considering every node in the constituent parse tree and deciding its status – remove or retain. Integer linear programming with discriminative training is used to solve the problem. Under this model, we incorporate vario...

Machine learning of syntactic parse trees for search and classification of text

We build an open-source toolkit which implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization towards syntactic parse trees and attempt to detect weak semantic signals from them. Generalization of syntactic parse tree as a syntactic similarity measure is defined as the set of maximum common subtrees and performed at a ...

Publication date: 2006